Performance Comparison of Asynchronous Transfer Configurations for UHD Game Image Compression with GPGPU
Abstract
Ultra-high-definition (UHD) game scenes have created a memory bandwidth problem. The lossless DPCM-GR-based compression algorithm [12], which supports bit-parallel pipelining, runs on general-purpose GPU (GPGPU) platforms such as NVIDIA CUDA (Compute Unified Device Architecture) and relieves the bandwidth problem without sacrificing image quality. Building on the algorithm of [12], this paper further improves memory bandwidth efficiency by using CUDA shared memory. In addition, several asynchronous transfer configurations that overlap kernel execution with data transfers between the host and the CUDA device are implemented using page-locked host memory. Experimental results show that CUDA-based GPGPU computing achieves maximum speedups of 87.5 and 30.6 times over the host CPU on the GTX650Ti and GT330, respectively. Among the concurrent transfer configurations tested, the maximum reductions in compression time are 54.1% on the GTX650Ti and 30.3% on the GT330.
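The abstract names the DPCM-GR scheme of [12] without describing it. As a hedged illustration only, assuming "GR" denotes Golomb-Rice coding and using the simplest DPCM predictor (the left neighbour), the sketch below losslessly encodes one 8-bit pixel row. The fixed Rice parameter `k`, the raw first pixel, the zigzag mapping of signed residuals, and every function name are hypothetical choices, not taken from [12]; the bit-parallel pipelining and CUDA shared-memory layout are not modelled.

```python
def zigzag(e: int) -> int:
    """Map a signed residual to a non-negative integer: 0,-1,1,-2,2 -> 0,1,2,3,4."""
    return 2 * e if e >= 0 else -2 * e - 1


def unzigzag(u: int) -> int:
    """Inverse of zigzag."""
    return u // 2 if u % 2 == 0 else -(u + 1) // 2


def rice_encode(u: int, k: int) -> str:
    """Golomb-Rice code: quotient in unary ('1' * q + '0'), remainder in k bits."""
    q, r = u >> k, u & ((1 << k) - 1)
    return "1" * q + "0" + (format(r, f"0{k}b") if k else "")


def rice_decode(bits: str, pos: int, k: int) -> tuple[int, int]:
    """Decode one value starting at bits[pos]; return (value, next position)."""
    q = 0
    while bits[pos] == "1":
        q += 1
        pos += 1
    pos += 1  # skip the terminating '0' of the unary part
    r = int(bits[pos:pos + k], 2) if k else 0
    return (q << k) | r, pos + k


def compress_row(row: list[int], k: int = 2) -> str:
    """Lossless DPCM (left-neighbour predictor) + Golomb-Rice for one non-empty 8-bit row."""
    bits = [format(row[0], "08b")]  # hypothetical choice: first pixel stored raw
    for prev, cur in zip(row, row[1:]):
        bits.append(rice_encode(zigzag(cur - prev), k))
    return "".join(bits)


def decompress_row(bits: str, n: int, k: int = 2) -> list[int]:
    """Invert compress_row for a row of n pixels."""
    out, pos = [int(bits[:8], 2)], 8
    while len(out) < n:
        u, pos = rice_decode(bits, pos, k)
        out.append(out[-1] + unzigzag(u))
    return out


row = [100, 101, 103, 102, 102, 104]
code = compress_row(row)
assert decompress_row(code, len(row)) == row  # lossless round trip
print(len(code), "bits instead of", 8 * len(row))  # 25 bits instead of 48
```

Smooth regions produce small residuals, which the Rice code stores in a few bits each, so the row shrinks without any loss; a real codec would also choose `k` adaptively rather than fixing it.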
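Why overlapping transfers with kernel execution shortens compression time can be reasoned about with a toy timing model. In CUDA, such overlap is normally obtained by allocating host buffers with `cudaMallocHost` or `cudaHostAlloc` (page-locked memory) and issuing `cudaMemcpyAsync` calls and kernel launches on separate non-default streams. The sketch below is not the authors' code and does not reproduce their configurations: it assumes the work is split into equal chunks, one per stream, and that each hardware queue (host-to-device copy, kernel, device-to-host copy, or a single shared copy engine) executes its operations in issue order. The stage timings, the depth-first versus breadth-first issue orders, and the names `Op`, `simulate`, and `build` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Op:
    stream: int      # stream (chunk) the operation belongs to
    engine: str      # queue that executes it: "h2d", "d2h", "copy" or "kernel"
    duration: float  # hypothetical time in ms


def simulate(ops: list[Op]) -> float:
    """Makespan when every engine and every stream executes its ops in issue order."""
    engine_free: dict[str, float] = {}
    stream_free: dict[int, float] = {}
    makespan = 0.0
    for op in ops:
        # An op waits for the previous op on its engine and the previous op in its stream.
        start = max(engine_free.get(op.engine, 0.0), stream_free.get(op.stream, 0.0))
        finish = start + op.duration
        engine_free[op.engine] = finish
        stream_free[op.stream] = finish
        makespan = max(makespan, finish)
    return makespan


def build(n_streams: int, h2d: float, kernel: float, d2h: float,
          copy_engines: int, breadth_first: bool) -> list[Op]:
    """Split the work into n_streams equal chunks and return the ops in issue order."""
    # Two copy engines: separate upload/download queues; otherwise one shared queue.
    up = "h2d" if copy_engines == 2 else "copy"
    down = "d2h" if copy_engines == 2 else "copy"
    stages = [(up, h2d / n_streams), ("kernel", kernel / n_streams), (down, d2h / n_streams)]
    if breadth_first:
        # all uploads, then all kernels, then all downloads
        return [Op(s, eng, t) for eng, t in stages for s in range(n_streams)]
    # depth-first: upload, kernel, download for one stream, then the next stream
    return [Op(s, eng, t) for s in range(n_streams) for eng, t in stages]


baseline = simulate(build(1, 4.0, 4.0, 4.0, copy_engines=2, breadth_first=True))
for engines in (1, 2):
    for bf in (False, True):
        t = simulate(build(4, 4.0, 4.0, 4.0, engines, bf))
        print(f"{engines} copy engine(s), {'breadth' if bf else 'depth'}-first: "
              f"{t:.1f} ms vs {baseline:.1f} ms serial")
```

With these made-up timings the serial baseline takes 12 ms. With one shared copy engine, depth-first issue gives no gain in this model (12 ms) because each download blocks the next upload, while breadth-first issue reaches 8 ms; with separate upload and download engines both orders reach 6 ms. These figures illustrate the mechanism only; they are not the paper's measurements, and real scheduling differs between GPU generations.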
Publication date: 2016